Bienvenido a la primera práctica de Jupyter Notebook

Esta primera práctica se realizará sobre los datos obtenidos de la página http://insideairbnb.com/get-the-data.html

Para cargar librerías

Antes de realizar un análisis, se debe verificar si se tienen las librerías instaladas, si no se tiene alguna librerìa, se debe proceder a instalarlar.

In [2]:
import os, sys
import numpy as np
import pandas as pd
import pandas_profiling   ####Verificar que esté instalada. Si no está instalada, se puede utilziar el comando conda install -c conda-forge pandas-profiling --y

Configuración del directorio

In [4]:
%cd C:\Users\USUARIO\Documents\Monitoria_cda_2019_2\Clases_Python\Clase1_Python_ams    
C:\Users\USUARIO\Documents\Monitoria_cda_2019_2\Clases_Python\Clase1_Python_ams

Lectura de los datos

In [7]:
#data= pd.read_csv('listings.csv', sep=',',  error_bad_lines=False)

data=pd.read_csv('listings.csv')
In [9]:
##Para seleccionar una primera vista de la tabla
data.head()
Out[9]:
id listing_url scrape_id last_scraped name summary space description experiences_offered neighborhood_overview ... instant_bookable is_business_travel_ready cancellation_policy require_guest_profile_picture require_guest_phone_verification calculated_host_listings_count calculated_host_listings_count_entire_homes calculated_host_listings_count_private_rooms calculated_host_listings_count_shared_rooms reviews_per_month
0 2818 https://www.airbnb.com/rooms/2818 20190708161114 2019-07-09 Quiet Garden View Room & Super Fast WiFi Quiet Garden View Room & Super Fast WiFi I'm renting a bedroom (room overlooking the ga... Quiet Garden View Room & Super Fast WiFi I'm r... none Indische Buurt ("Indies Neighborhood") is a ne... ... t f strict_14_with_grace_period f f 1 0 1 0 2.09
1 20168 https://www.airbnb.com/rooms/20168 20190708161114 2019-07-09 Studio with private bathroom in the centre 1 Cozy studio on your own private floor, 100% in... For those who like all facets of city life. In... Cozy studio on your own private floor, 100% in... none Located just in between famous central canals.... ... f f strict_14_with_grace_period f f 2 0 2 0 2.45
2 25428 https://www.airbnb.com/rooms/25428 20190708161114 2019-07-09 Lovely apt in City Centre (w.lift) near Jordaan NaN This nicely furnished, newly renovated apt is... This nicely furnished, newly renovated apt is... none NaN ... f f strict_14_with_grace_period f f 2 2 0 0 0.17
3 27886 https://www.airbnb.com/rooms/27886 20190708161114 2019-07-09 Romantic, stylish B&B houseboat in canal district Stylish and romantic houseboat on fantastic hi... For a romantic couple: A beautifully restored ... Stylish and romantic houseboat on fantastic hi... none Central, quiet, safe, clean and beautiful. ... t f strict_14_with_grace_period f f 1 0 1 0 2.14
4 28871 https://www.airbnb.com/rooms/28871 20190708161114 2019-07-09 Comfortable double room NaN In a monumental house right in the center of A... In a monumental house right in the center of A... none NaN ... f f moderate f f 3 0 3 0 2.56

5 rows × 106 columns

Como se puede observar en la vista anterior, la primera columna no tiene nombre, esto es por que el data frame de pandas asigna un índice, el cual nos puede ayudar a realizar algunas consultas, por ejemplo, suponga que se va seleccionar la primera fila, es decir la fila asociada al índice 0. para tal fin, se utiliza df.loc[a:b, "column name"], donde a y b corresponden al rango del índice de las filas y column name el nombre de la variable a seleccionar, si se desea seleccionar todas las variables, se agrega el comando :

In [32]:
data.loc[0:0, : ]
Out[32]:
id listing_url scrape_id last_scraped name summary space description experiences_offered neighborhood_overview ... instant_bookable is_business_travel_ready cancellation_policy require_guest_profile_picture require_guest_phone_verification calculated_host_listings_count calculated_host_listings_count_entire_homes calculated_host_listings_count_private_rooms calculated_host_listings_count_shared_rooms reviews_per_month
0 2818 https://www.airbnb.com/rooms/2818 20190708161114 2019-07-09 Quiet Garden View Room & Super Fast WiFi Quiet Garden View Room & Super Fast WiFi I'm renting a bedroom (room overlooking the ga... Quiet Garden View Room & Super Fast WiFi I'm r... none Indische Buurt ("Indies Neighborhood") is a ne... ... t f strict_14_with_grace_period f f 1 0 1 0 2.09

1 rows × 106 columns

Ahora, ¿qué pasaría si no consideraramos 0:0, y por el contrario, tomamos sólo 0.

In [33]:
data.loc[0, : ]
Out[33]:
id                                                                                           2818
listing_url                                                     https://www.airbnb.com/rooms/2818
scrape_id                                                                          20190708161114
last_scraped                                                                           2019-07-09
name                                                     Quiet Garden View Room & Super Fast WiFi
summary                                                  Quiet Garden View Room & Super Fast WiFi
space                                           I'm renting a bedroom (room overlooking the ga...
description                                     Quiet Garden View Room & Super Fast WiFi I'm r...
experiences_offered                                                                          none
neighborhood_overview                           Indische Buurt ("Indies Neighborhood") is a ne...
notes                                           From week 38 to week 47 maintenance work to th...
transit                                         The neighbourhood is well served by 24 hours p...
access                                                                                        NaN
interaction                                                                                   NaN
house_rules                                     Please: - Leave your shoes in the entrance  - ...
thumbnail_url                                                                                 NaN
medium_url                                                                                    NaN
picture_url                                     https://a0.muscache.com/im/pictures/10272854/8...
xl_picture_url                                                                                NaN
host_id                                                                                      3159
host_url                                                   https://www.airbnb.com/users/show/3159
host_name                                                                                  Daniel
host_since                                                                             2008-09-24
host_location                                           Amsterdam, Noord-Holland, The Netherlands
host_about                                      Upon arriving in Amsterdam, one can imagine as...
host_response_time                                                                 within an hour
host_response_rate                                                                           100%
host_acceptance_rate                                                                          NaN
host_is_superhost                                                                               t
host_thumbnail_url                              https://a0.muscache.com/im/users/3159/profile_...
                                                                      ...                        
has_availability                                                                                t
availability_30                                                                                 5
availability_60                                                                                21
availability_90                                                                                35
availability_365                                                                              107
calendar_last_scraped                                                                  2019-07-09
number_of_reviews                                                                             262
number_of_reviews_ltm                                                                          27
first_review                                                                           2009-03-30
last_review                                                                            2019-06-28
review_scores_rating                                                                           98
review_scores_accuracy                                                                         10
review_scores_cleanliness                                                                      10
review_scores_checkin                                                                          10
review_scores_communication                                                                    10
review_scores_location                                                                          9
review_scores_value                                                                            10
requires_license                                                                                f
license                                                                                       NaN
jurisdiction_names                               {Amsterdam," NL Zip Codes 2"," Amsterdam"," NL"}
instant_bookable                                                                                t
is_business_travel_ready                                                                        f
cancellation_policy                                                   strict_14_with_grace_period
require_guest_profile_picture                                                                   f
require_guest_phone_verification                                                                f
calculated_host_listings_count                                                                  1
calculated_host_listings_count_entire_homes                                                     0
calculated_host_listings_count_private_rooms                                                    1
calculated_host_listings_count_shared_rooms                                                     0
reviews_per_month                                                                            2.09
Name: 0, Length: 106, dtype: object

Suponga ahora, que se va seleccionar las filas asociadas a las filas 0,1 y 2.

In [34]:
data.loc[0:2, : ]
Out[34]:
id listing_url scrape_id last_scraped name summary space description experiences_offered neighborhood_overview ... instant_bookable is_business_travel_ready cancellation_policy require_guest_profile_picture require_guest_phone_verification calculated_host_listings_count calculated_host_listings_count_entire_homes calculated_host_listings_count_private_rooms calculated_host_listings_count_shared_rooms reviews_per_month
0 2818 https://www.airbnb.com/rooms/2818 20190708161114 2019-07-09 Quiet Garden View Room & Super Fast WiFi Quiet Garden View Room & Super Fast WiFi I'm renting a bedroom (room overlooking the ga... Quiet Garden View Room & Super Fast WiFi I'm r... none Indische Buurt ("Indies Neighborhood") is a ne... ... t f strict_14_with_grace_period f f 1 0 1 0 2.09
1 20168 https://www.airbnb.com/rooms/20168 20190708161114 2019-07-09 Studio with private bathroom in the centre 1 Cozy studio on your own private floor, 100% in... For those who like all facets of city life. In... Cozy studio on your own private floor, 100% in... none Located just in between famous central canals.... ... f f strict_14_with_grace_period f f 2 0 2 0 2.45
2 25428 https://www.airbnb.com/rooms/25428 20190708161114 2019-07-09 Lovely apt in City Centre (w.lift) near Jordaan NaN This nicely furnished, newly renovated apt is... This nicely furnished, newly renovated apt is... none NaN ... f f strict_14_with_grace_period f f 2 2 0 0 0.17

3 rows × 106 columns

Ahora, se va a seleccionar únicamente la variable "name" de las tres primeras filas, es decir, de los índices 0, 1 y 2.

In [45]:
data.loc[0:2, 'name']
Out[45]:
0           Quiet Garden View Room & Super Fast WiFi
1       Studio with private bathroom in the centre 1
2    Lovely apt in City Centre (w.lift) near Jordaan
Name: name, dtype: object

A continuación, se va seleccionar las variables "name" y "summary" y las tres primeras filas, es decir, los índices 0, 1 y 2.

In [43]:
data.loc[0:2, ['name', 'summary']]
Out[43]:
name summary
0 Quiet Garden View Room & Super Fast WiFi Quiet Garden View Room & Super Fast WiFi
1 Studio with private bathroom in the centre 1 Cozy studio on your own private floor, 100% in...
2 Lovely apt in City Centre (w.lift) near Jordaan NaN

Finalmente, se va seleccionar únicamente la variable "name", y se considerarán todas las filas.

In [46]:
data.loc[:, 'name']
Out[46]:
0                 Quiet Garden View Room & Super Fast WiFi
1             Studio with private bathroom in the centre 1
2          Lovely apt in City Centre (w.lift) near Jordaan
3        Romantic, stylish B&B houseboat in canal district
4                                  Comfortable double room
5                                  Comfortable single room
6                      2-story apartment + rooftop terrace
7                      Nice and quiet place in the Jordaan
8                        Amsterdam Center Entire Apartment
9              Comfortable room@PERFECT location + 2 bikes
10                        Oasis in the middle of Amsterdam
11       View into park / museum district (long/short s...
12                      Amsterdam Centre, 3-room Apartment
13                          Cozy loft in central Amsterdam
14                        Charming apartment in old centre
15                      Amsterdam Central and lot of space
16            Multatuli Luxury Guest Suite in top location
17                     Perfect central Amsterdam apartment
18                      B & B de 9 Straatjes (city center)
19                     Bright Apartment - residential area
20                                          Amsterdam Aqua
21                Green studio at the attic of a townhouse
22                Nice room near centre with en suite bath
23         Large quiet Studio with gardenview in hip area.
24                                   Luminous central room
25                  Fully equiped house, PIJP area = great
26                        groundfloor apartment with patio
27                         Bright Loft in Centre Amsterdam
28                        Greatly located, cozy atmosphere
29                     Apartment near Museumplein (centre)
                               ...                        
20307             Stunning & Spacious Apartment in De Pijp
20308        Nordic design apt.  canal view & roof terrace
20309    Sunny 2 bedroom apartment (70m2), with water v...
20310      Light and spacious appartment in Amsterdam West
20311                                         Forest Queen
20312        Clean and modern family house in a quiet area
20313    Sunrise & Sunset View mins walk fr Central Sta...
20314     Cozy private room for couple in bright apartment
20315                   Amsterdam 3 bedroom full apartment
20316             In the Attic Canal House Small Apartment
20317         Luxury Canal Studio in City Center Amsterdam
20318                                         test listing
20319      Central, light apartment in Oud-West, Amsterdam
20320                    Cosy apartment in lovely old-west
20321        beautiful ground floor apartment with garden.
20322    Luxurious romantic apartment close to city centre
20323    Quaint Dutch house & garden, free parking & bikes
20324            Warm and refined two bedroom/home feeling
20325         Pink ins two bedroom / sweet and comfortable
20326          Flower Room American Vintage Warm Apartment
20327         Warm Sen apartment / giant screen projection
20328     Nordic style hardcover two-bedroom / elegant bed
20329         Artist's sincerity / elaborate arrangement /
20330               Exquisite and elegant fashion bed room
20331     Bamboo forest quiet high quality room / can cook
20332    Apartment Amsterdam next to city center (1 per...
20333             Penthouse in Watergraafsmeer (Amsterdam)
20334    Cozy room in Amsterdam with nice (Hidden by Ai...
20335          City apartment near centre, water and beach
20336                    Room in lovely central  apartment
Name: name, Length: 20337, dtype: object

Estudio sobre el conjunto de variables ¿cuál es el tipo de variables? ¿realmente la herramienta lee correctamente los datos?

A continuación, se realiza un primer análisis sobre el tipo de variables, para tal fin, se utiliza la función data.dtypes

In [47]:
data.dtypes
Out[47]:
id                                                int64
listing_url                                      object
scrape_id                                         int64
last_scraped                                     object
name                                             object
summary                                          object
space                                            object
description                                      object
experiences_offered                              object
neighborhood_overview                            object
notes                                            object
transit                                          object
access                                           object
interaction                                      object
house_rules                                      object
thumbnail_url                                   float64
medium_url                                      float64
picture_url                                      object
xl_picture_url                                  float64
host_id                                           int64
host_url                                         object
host_name                                        object
host_since                                       object
host_location                                    object
host_about                                       object
host_response_time                               object
host_response_rate                               object
host_acceptance_rate                            float64
host_is_superhost                                object
host_thumbnail_url                               object
                                                 ...   
has_availability                                 object
availability_30                                   int64
availability_60                                   int64
availability_90                                   int64
availability_365                                  int64
calendar_last_scraped                            object
number_of_reviews                                 int64
number_of_reviews_ltm                             int64
first_review                                     object
last_review                                      object
review_scores_rating                            float64
review_scores_accuracy                          float64
review_scores_cleanliness                       float64
review_scores_checkin                           float64
review_scores_communication                     float64
review_scores_location                          float64
review_scores_value                             float64
requires_license                                 object
license                                          object
jurisdiction_names                               object
instant_bookable                                 object
is_business_travel_ready                         object
cancellation_policy                              object
require_guest_profile_picture                    object
require_guest_phone_verification                 object
calculated_host_listings_count                    int64
calculated_host_listings_count_entire_homes       int64
calculated_host_listings_count_private_rooms      int64
calculated_host_listings_count_shared_rooms       int64
reviews_per_month                               float64
Length: 106, dtype: object

Hay algunas variables para las cuales toca tener mucho cuidado, dado que el tipo no es corresto.

In [79]:
variables[0:50]
Out[79]:
Index(['id', 'listing_url', 'scrape_id', 'last_scraped', 'name', 'summary',
       'space', 'description', 'experiences_offered', 'neighborhood_overview',
       'notes', 'transit', 'access', 'interaction', 'house_rules',
       'thumbnail_url', 'medium_url', 'picture_url', 'xl_picture_url',
       'host_id', 'host_url', 'host_name', 'host_since', 'host_location',
       'host_about', 'host_response_time', 'host_response_rate',
       'host_acceptance_rate', 'host_is_superhost', 'host_thumbnail_url',
       'host_picture_url', 'host_neighbourhood', 'host_listings_count',
       'host_total_listings_count', 'host_verifications',
       'host_has_profile_pic', 'host_identity_verified', 'street',
       'neighbourhood', 'neighbourhood_cleansed',
       'neighbourhood_group_cleansed', 'city', 'state', 'zipcode', 'market',
       'smart_location', 'country_code', 'country', 'latitude', 'longitude'],
      dtype='object')
In [78]:
variables[51:100]
Out[78]:
Index(['property_type', 'room_type', 'accommodates', 'bathrooms', 'bedrooms',
       'beds', 'bed_type', 'amenities', 'square_feet', 'price', 'weekly_price',
       'monthly_price', 'security_deposit', 'cleaning_fee', 'guests_included',
       'extra_people', 'minimum_nights', 'maximum_nights',
       'minimum_minimum_nights', 'maximum_minimum_nights',
       'minimum_maximum_nights', 'maximum_maximum_nights',
       'minimum_nights_avg_ntm', 'maximum_nights_avg_ntm', 'calendar_updated',
       'has_availability', 'availability_30', 'availability_60',
       'availability_90', 'availability_365', 'calendar_last_scraped',
       'number_of_reviews', 'number_of_reviews_ltm', 'first_review',
       'last_review', 'review_scores_rating', 'review_scores_accuracy',
       'review_scores_cleanliness', 'review_scores_checkin',
       'review_scores_communication', 'review_scores_location',
       'review_scores_value', 'requires_license', 'license',
       'jurisdiction_names', 'instant_bookable', 'is_business_travel_ready',
       'cancellation_policy', 'require_guest_profile_picture'],
      dtype='object')

A continuación, se va seleccionar un conjunto de variables que van a ser de interés para un análisis preliminar.

In [113]:
data_analisis=data.loc[:, ['id','room_type', 'bathrooms', 'bedrooms', 'bed_type', 'price', 'weekly_price',
       'monthly_price', 'security_deposit', 'city', 'state', 'country','first_review','last_review', 'review_scores_rating', 'number_of_reviews','has_availability', 'availability_30', 'availability_60',
       'availability_90', 'availability_365']]
In [114]:
data_analisis.head()
Out[114]:
id room_type bathrooms bedrooms bed_type price weekly_price monthly_price security_deposit city ... country first_review last_review review_scores_rating number_of_reviews has_availability availability_30 availability_60 availability_90 availability_365
0 2818 Private room 1.5 1.0 Real Bed $59.00 NaN $1,500.00 $200.00 Amsterdam ... Netherlands 2009-03-30 2019-06-28 98.0 262 t 5 21 35 107
1 20168 Private room 1.0 1.0 Real Bed $80.00 NaN NaN NaN Amsterdam ... Netherlands 2010-03-02 2019-07-08 88.0 279 t 0 11 32 140
2 25428 Entire home/apt 1.0 1.0 Real Bed $125.00 $650.00 $2,000.00 $300.00 Amsterdam ... Netherlands 2018-01-21 2019-05-11 100.0 3 t 0 4 6 106
3 27886 Private room 1.0 1.0 Real Bed $150.00 $810.00 $2,500.00 $0.00 Amsterdam ... Netherlands 2012-01-09 2019-07-01 99.0 195 t 0 6 14 74
4 28871 Private room NaN 1.0 Real Bed $75.00 $499.00 $1,956.00 NaN Amsterdam ... Netherlands 2010-08-22 2019-07-02 97.0 277 t 5 7 10 138

5 rows × 21 columns

In [115]:
data_analisis.dtypes
Out[115]:
id                        int64
room_type                object
bathrooms               float64
bedrooms                float64
bed_type                 object
price                    object
weekly_price             object
monthly_price            object
security_deposit         object
city                     object
state                    object
country                  object
first_review             object
last_review              object
review_scores_rating    float64
number_of_reviews         int64
has_availability         object
availability_30           int64
availability_60           int64
availability_90           int64
availability_365          int64
dtype: object

Se va corregir los signos pesos, para tal fin, se utilizará un conjunto de funciones

In [116]:
def eliminar_pesos(x):
    x['monthly_price']=str(x['monthly_price']).replace('$','')
    return x['monthly_price']

def eliminar_pesos_2(x):
    x['weekly_price']=str(x['weekly_price']).replace('$','')
    return x['weekly_price']

def eliminar_pesos_3(x):
    x['price']=str(x['price']).replace('$','')
    return x['price']
In [117]:
data_analisis['monthly_price']=data_analisis.apply(eliminar_pesos,axis=1)
data_analisis['weekly_price']=data_analisis.apply(eliminar_pesos_2,axis=1)
data_analisis['price']=data_analisis.apply(eliminar_pesos_3,axis=1)
In [118]:
data_analisis.head(100)
Out[118]:
id room_type bathrooms bedrooms bed_type price weekly_price monthly_price security_deposit city ... country first_review last_review review_scores_rating number_of_reviews has_availability availability_30 availability_60 availability_90 availability_365
0 2818 Private room 1.5 1.0 Real Bed 59.00 nan 1,500.00 $200.00 Amsterdam ... Netherlands 2009-03-30 2019-06-28 98.0 262 t 5 21 35 107
1 20168 Private room 1.0 1.0 Real Bed 80.00 nan nan NaN Amsterdam ... Netherlands 2010-03-02 2019-07-08 88.0 279 t 0 11 32 140
2 25428 Entire home/apt 1.0 1.0 Real Bed 125.00 650.00 2,000.00 $300.00 Amsterdam ... Netherlands 2018-01-21 2019-05-11 100.0 3 t 0 4 6 106
3 27886 Private room 1.0 1.0 Real Bed 150.00 810.00 2,500.00 $0.00 Amsterdam ... Netherlands 2012-01-09 2019-07-01 99.0 195 t 0 6 14 74
4 28871 Private room NaN 1.0 Real Bed 75.00 499.00 1,956.00 NaN Amsterdam ... Netherlands 2010-08-22 2019-07-02 97.0 277 t 5 7 10 138
5 29051 Private room 1.0 1.0 Real Bed 55.00 350.00 1,435.00 NaN Amsterdam ... Netherlands 2011-03-16 2019-06-29 95.0 428 t 2 7 9 173
6 31080 Entire home/apt 1.0 3.0 Real Bed 219.00 1,400.00 4,000.00 NaN Amsterdam ... Netherlands 2011-08-06 2017-10-16 95.0 32 t 0 0 0 0
7 38266 Entire home/apt 1.0 1.0 Futon 145.00 nan nan $300.00 Amsterdam ... Netherlands 2010-07-23 2019-05-19 99.0 199 t 0 0 0 131
8 41125 Entire home/apt 0.0 1.0 Real Bed 180.00 650.00 1,600.00 $150.00 Amsterdam ... Netherlands 2010-11-25 2019-06-23 96.0 85 t 0 0 0 0
9 42970 Private room 1.0 1.0 Real Bed 159.00 nan 3,000.00 $300.00 Amsterdam ... Netherlands 2010-09-06 2019-07-01 98.0 450 t 12 38 59 110
10 43109 Entire home/apt 1.0 2.0 Real Bed 210.00 2,100.00 4,500.00 NaN Amsterdam ... Netherlands 2013-11-12 2019-07-06 97.0 690 t 0 0 0 0
11 43980 Entire home/apt 1.0 1.0 Real Bed 100.00 680.00 1,250.00 $300.00 Amsterdam ... Netherlands 2010-11-01 2018-02-18 80.0 61 t 30 60 84 182
12 44391 Entire home/apt 1.0 2.0 Real Bed 240.00 nan nan $0.00 Amsterdam ... Netherlands 2010-09-16 2019-05-12 94.0 35 t 1 1 1 1
13 46386 Entire home/apt 1.0 1.0 Real Bed 150.00 nan nan $250.00 Amsterdam ... Netherlands 2010-09-16 2018-01-03 93.0 3 t 0 0 0 0
14 47061 Entire home/apt 1.0 2.0 Real Bed 140.00 720.00 2,200.00 $250.00 Amsterdam ... Netherlands 2010-09-13 2019-06-19 96.0 182 t 0 0 0 37
15 48076 Entire home/apt 2.0 3.0 Real Bed 350.00 nan nan $0.00 Amsterdam ... Netherlands 2011-10-02 2019-06-29 97.0 182 t 5 14 27 146
16 49552 Entire home/apt 1.0 2.0 Real Bed 220.00 nan nan $500.00 Amsterdam ... Netherlands 2010-10-29 2019-06-30 98.0 327 t 3 9 25 242
17 50518 Entire home/apt 1.0 1.0 Real Bed 125.00 nan nan $0.00 Amsterdam ... Netherlands 2012-10-24 2019-06-20 96.0 97 t 0 0 0 0
18 50523 Private room 1.0 1.0 Real Bed 115.00 640.00 2,300.00 NaN Amsterdam ... Netherlands 2011-01-04 2019-06-24 97.0 249 t 2 6 8 223
19 50570 Entire home/apt 1.0 1.0 Real Bed 90.00 485.00 1,500.00 $0.00 Amsterdam ... Netherlands 2011-03-31 2019-03-26 91.0 157 t 0 0 0 0
20 52490 Private room 1.5 1.0 Real Bed 72.00 550.00 1,400.00 $0.00 Amsterdam ... Netherlands 2010-11-03 2019-06-24 93.0 90 t 12 12 12 12
21 53067 Private room 1.0 1.0 Real Bed 87.00 nan nan $0.00 Amsterdam ... Netherlands 2010-11-21 2019-06-29 93.0 345 t 1 2 2 2
22 53671 Private room 1.0 1.0 Real Bed 75.00 480.00 1,300.00 $150.00 Amsterdam ... Netherlands 2011-04-12 2019-07-02 91.0 291 t 5 18 21 267
23 53692 Private room 1.0 1.0 Real Bed 60.00 nan nan $0.00 Amsterdam ... Netherlands 2011-06-04 2019-07-06 91.0 276 t 2 10 10 10
24 55256 Private room 1.0 1.0 Futon 86.00 500.00 1,700.00 $0.00 Amsterdam ... Netherlands 2011-02-20 2019-06-22 92.0 161 t 2 24 47 137
25 55621 Entire home/apt 1.0 2.0 Real Bed 222.00 1,000.00 nan $200.00 Amsterdam ... Netherlands 2010-11-19 2019-06-14 95.0 30 t 0 0 0 3
26 55703 Entire home/apt 1.0 3.0 Real Bed 250.00 800.00 3,000.00 $500.00 Amsterdam ... Netherlands 2015-08-10 2016-10-19 100.0 3 t 0 0 0 189
27 55709 Entire home/apt 1.0 1.0 Real Bed 159.00 895.00 2,195.00 NaN Amsterdam ... Netherlands 2010-12-06 2019-03-27 100.0 53 t 1 1 1 1
28 55807 Private room 1.0 1.0 Real Bed 60.00 nan nan NaN Amsterdam ... Netherlands 2010-11-16 2019-06-23 97.0 156 t 4 4 4 102
29 55868 Entire home/apt 1.5 2.0 Real Bed 149.00 nan nan $300.00 Amsterdam ... Netherlands 2011-01-03 2019-05-31 97.0 90 t 3 3 3 3
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
70 152280 Private room 1.0 1.0 Real Bed 89.00 nan nan $100.00 Amsterdam ... Netherlands 2011-07-03 2019-04-27 88.0 589 t 0 0 0 0
71 158466 Private room 1.0 1.0 Real Bed 79.00 nan nan $100.00 Amsterdam ... Netherlands 2011-08-05 2019-06-30 84.0 530 t 0 0 0 0
72 162467 Entire home/apt 1.0 1.0 Real Bed 145.00 nan nan NaN Amsterdam ... Netherlands 2013-01-22 2019-06-22 95.0 53 t 0 1 1 191
73 165833 Entire home/apt 1.0 2.0 Real Bed 74.00 nan nan $200.00 Amsterdam ... Netherlands 2014-09-16 2018-05-23 97.0 37 t 0 0 0 180
74 168769 Private room 1.5 1.0 Real Bed 115.00 nan nan $0.00 Amsterdam ... Netherlands 2011-09-28 2019-07-05 91.0 235 t 10 28 28 28
75 169356 Entire home/apt 1.0 1.0 Real Bed 165.00 700.00 nan $500.00 Amsterdam ... Netherlands 2011-09-15 2019-04-26 100.0 19 t 0 11 11 32
76 171054 Entire home/apt 1.0 2.0 Real Bed 120.00 650.00 1,554.00 $100.00 Amsterdam ... Netherlands 2011-07-18 2018-11-28 95.0 103 t 0 0 0 0
77 171631 Entire home/apt 1.0 3.0 Real Bed 89.00 nan nan $250.00 Amsterdam ... Netherlands 2011-07-25 2018-04-03 96.0 45 t 29 59 74 294
78 175989 Entire home/apt 1.0 2.0 Real Bed 100.00 800.00 2,500.00 $100.00 Amsterdam ... Netherlands 2011-09-21 2019-06-02 97.0 101 t 0 0 0 0
79 176423 Entire home/apt 1.0 1.0 Real Bed 85.00 nan nan $175.00 Amsterdam ... Netherlands 2011-09-13 2017-09-20 90.0 63 t 12 12 12 12
80 179528 Entire home/apt 1.0 2.0 Real Bed 230.00 nan 2,751.00 $300.00 Amsterdam ... Netherlands 2011-09-11 2017-03-03 89.0 26 t 0 0 6 249
81 182839 Entire home/apt 1.0 2.0 Real Bed 125.00 825.00 3,000.00 $150.00 Amsterdam ... Netherlands 2012-03-26 2012-07-26 98.0 15 t 0 0 0 0
82 184282 Entire home/apt 1.0 2.0 Real Bed 230.00 1,300.00 3,000.00 NaN Amsterdam ... Netherlands 2011-11-15 2019-06-23 91.0 181 t 2 14 26 288
83 188347 Private room 1.0 1.0 Real Bed 50.00 nan nan $150.00 Amsterdam ... Netherlands 2011-08-24 2019-06-26 93.0 341 t 1 13 13 13
84 189754 Private room 1.0 1.0 Real Bed 160.00 851.00 3,200.00 NaN Amsterdam ... Netherlands 2011-09-17 2019-07-05 95.0 158 t 3 5 17 245
85 190943 Entire home/apt 1.0 2.0 Real Bed 150.00 nan nan $250.00 Amsterdam ... Netherlands 2012-08-06 2019-05-07 86.0 10 t 12 14 14 14
86 191404 Entire home/apt 1.0 2.0 Real Bed 195.00 1,000.00 nan $0.00 Amsterdam ... Netherlands 2011-10-05 2019-05-05 100.0 7 t 3 3 3 3
87 193038 Private room 1.0 1.0 Real Bed 95.00 575.00 2,250.00 NaN Amsterdam ... Netherlands 2011-09-12 2019-06-21 99.0 539 t 1 4 8 231
88 198307 Entire home/apt 1.5 1.0 Real Bed 129.00 840.00 nan $250.00 Amsterdam ... Netherlands 2011-08-29 2017-05-03 99.0 37 t 2 2 2 2
89 199844 Entire home/apt 1.0 1.0 Real Bed 99.00 nan nan $1,000.00 Amsterdam ... Netherlands 2011-09-14 2017-05-02 96.0 42 t 0 0 0 0
90 200461 Entire home/apt 1.0 1.0 Real Bed 79.00 nan 1,290.00 $325.00 Amsterdam ... Netherlands 2011-08-29 2019-02-20 87.0 36 t 3 3 3 3
91 203658 Entire home/apt 1.5 3.0 Real Bed 284.00 nan nan $300.00 Amsterdam ... Netherlands 2012-01-04 2017-07-02 97.0 24 t 0 0 0 0
92 205759 Entire home/apt 1.0 1.0 Real Bed 99.00 560.00 nan NaN Amsterdam ... Netherlands 2015-07-20 2015-07-27 90.0 2 t 0 0 0 0
93 212050 Entire home/apt 1.5 2.0 Real Bed 180.00 nan nan NaN Amsterdam ... Netherlands 2011-10-21 2019-04-29 94.0 16 t 12 15 15 204
94 213371 Entire home/apt 1.5 3.0 Pull-out Sofa 400.00 1,800.00 4,800.00 $200.00 Amsterdam ... Netherlands 2011-09-13 2019-06-30 96.0 18 t 22 52 59 59
95 214531 Private room 1.0 1.0 Real Bed 114.00 nan nan $0.00 Amsterdam ... Netherlands 2011-11-07 2019-05-29 96.0 158 t 0 0 0 223
96 217692 Entire home/apt 1.0 2.0 Real Bed 150.00 nan 2,499.00 $250.00 Amsterdam ... Netherlands 2014-09-17 2019-02-09 97.0 15 t 29 59 83 272
97 219276 Private room 1.0 1.0 Real Bed 265.00 nan nan $500.00 Amsterdam ... Netherlands 2012-07-31 2019-04-21 100.0 2 t 7 37 56 319
98 221922 Private room 1.0 1.0 Real Bed 99.00 nan nan $100.00 Bos en Lommer ... Netherlands 2011-11-07 2016-03-08 96.0 6 t 18 37 37 37
99 221943 Entire home/apt 1.0 2.0 Real Bed 170.00 1,200.00 3,500.00 $200.00 Amsterdam ... Netherlands 2011-10-12 2019-06-24 94.0 197 t 2 11 27 273

100 rows × 21 columns

Para las variables "price", "weekly_price" y "monthly_price" se va a eliminar el caracter "," , para finalmente corregirlo.

In [119]:
def eliminar_coma(x):
    x['monthly_price']=str(x['monthly_price']).replace(',','')
    return x['monthly_price']

def eliminar_coma_2(x):
    x['weekly_price']=str(x['weekly_price']).replace(',','')
    return x['weekly_price']

def eliminar_coma_3(x):
    x['price']=str(x['price']).replace(',','')
    return x['price']
In [120]:
data_analisis['monthly_price']=data_analisis.apply(eliminar_coma,axis=1)
data_analisis['weekly_price']=data_analisis.apply(eliminar_coma,axis=1)
data_analisis['price']=data_analisis.apply(eliminar_coma,axis=1)
In [121]:
data_analisis.head()
Out[121]:
id room_type bathrooms bedrooms bed_type price weekly_price monthly_price security_deposit city ... country first_review last_review review_scores_rating number_of_reviews has_availability availability_30 availability_60 availability_90 availability_365
0 2818 Private room 1.5 1.0 Real Bed 1500.00 1500.00 1500.00 $200.00 Amsterdam ... Netherlands 2009-03-30 2019-06-28 98.0 262 t 5 21 35 107
1 20168 Private room 1.0 1.0 Real Bed nan nan nan NaN Amsterdam ... Netherlands 2010-03-02 2019-07-08 88.0 279 t 0 11 32 140
2 25428 Entire home/apt 1.0 1.0 Real Bed 2000.00 2000.00 2000.00 $300.00 Amsterdam ... Netherlands 2018-01-21 2019-05-11 100.0 3 t 0 4 6 106
3 27886 Private room 1.0 1.0 Real Bed 2500.00 2500.00 2500.00 $0.00 Amsterdam ... Netherlands 2012-01-09 2019-07-01 99.0 195 t 0 6 14 74
4 28871 Private room NaN 1.0 Real Bed 1956.00 1956.00 1956.00 NaN Amsterdam ... Netherlands 2010-08-22 2019-07-02 97.0 277 t 5 7 10 138

5 rows × 21 columns

A pesar de eliminar el caracter ",", el tipo de variable de las variables "price", "weekly_price" y "monthly_price" siguen siendo carácteres.

In [122]:
data_analisis.dtypes
Out[122]:
id                        int64
room_type                object
bathrooms               float64
bedrooms                float64
bed_type                 object
price                    object
weekly_price             object
monthly_price            object
security_deposit         object
city                     object
state                    object
country                  object
first_review             object
last_review              object
review_scores_rating    float64
number_of_reviews         int64
has_availability         object
availability_30           int64
availability_60           int64
availability_90           int64
availability_365          int64
dtype: object

A continuación, se corrige el tipo de datos de las variables "price", "weekly_price" y "monthly_price", para tal fin se utiliza la función astype(tipo_variable) de pandas.

In [123]:
data_analisis["price"]=data_analisis["price"].astype(float)
data_analisis["weekly_price"]=data_analisis["price"].astype(float)
data_analisis["monthly_price"]=data_analisis["price"].astype(float)
In [124]:
data_analisis.head()
Out[124]:
id room_type bathrooms bedrooms bed_type price weekly_price monthly_price security_deposit city ... country first_review last_review review_scores_rating number_of_reviews has_availability availability_30 availability_60 availability_90 availability_365
0 2818 Private room 1.5 1.0 Real Bed 1500.0 1500.0 1500.0 $200.00 Amsterdam ... Netherlands 2009-03-30 2019-06-28 98.0 262 t 5 21 35 107
1 20168 Private room 1.0 1.0 Real Bed NaN NaN NaN NaN Amsterdam ... Netherlands 2010-03-02 2019-07-08 88.0 279 t 0 11 32 140
2 25428 Entire home/apt 1.0 1.0 Real Bed 2000.0 2000.0 2000.0 $300.00 Amsterdam ... Netherlands 2018-01-21 2019-05-11 100.0 3 t 0 4 6 106
3 27886 Private room 1.0 1.0 Real Bed 2500.0 2500.0 2500.0 $0.00 Amsterdam ... Netherlands 2012-01-09 2019-07-01 99.0 195 t 0 6 14 74
4 28871 Private room NaN 1.0 Real Bed 1956.0 1956.0 1956.0 NaN Amsterdam ... Netherlands 2010-08-22 2019-07-02 97.0 277 t 5 7 10 138

5 rows × 21 columns

Finalmente, se obtiene el número de variables y de filas del dataframe

In [126]:
data_analisis.shape
Out[126]:
(20337, 21)

El primer análisis sobre la tabla.

Para realizar un primer análisis exploratorio sobre la tabla, se utiliza la función pandas_profiling, el cual permite evidenciar el tipo de variable

In [127]:
pandas_profiling.ProfileReport(data_analisis)
Out[127]:

Supongamos, ahora, que vamos a realizar un análisis sobre "bed_type== Real Bed"., entonces:

In [130]:
consulta_1=data_analisis[data_analisis.bed_type=="Real Bed"]
consulta_1.head()
Out[130]:
id room_type bathrooms bedrooms bed_type price weekly_price monthly_price security_deposit city ... country first_review last_review review_scores_rating number_of_reviews has_availability availability_30 availability_60 availability_90 availability_365
0 2818 Private room 1.5 1.0 Real Bed 1500.0 1500.0 1500.0 $200.00 Amsterdam ... Netherlands 2009-03-30 2019-06-28 98.0 262 t 5 21 35 107
1 20168 Private room 1.0 1.0 Real Bed NaN NaN NaN NaN Amsterdam ... Netherlands 2010-03-02 2019-07-08 88.0 279 t 0 11 32 140
2 25428 Entire home/apt 1.0 1.0 Real Bed 2000.0 2000.0 2000.0 $300.00 Amsterdam ... Netherlands 2018-01-21 2019-05-11 100.0 3 t 0 4 6 106
3 27886 Private room 1.0 1.0 Real Bed 2500.0 2500.0 2500.0 $0.00 Amsterdam ... Netherlands 2012-01-09 2019-07-01 99.0 195 t 0 6 14 74
4 28871 Private room NaN 1.0 Real Bed 1956.0 1956.0 1956.0 NaN Amsterdam ... Netherlands 2010-08-22 2019-07-02 97.0 277 t 5 7 10 138

5 rows × 21 columns

Supongamos, ahora, que vamos a realizar un análisis sobre "bed_type== Real Bed" y number_of_reviews >23, entonces:

In [134]:
consulta_2=data_analisis[(data_analisis.bed_type=="Real Bed") & (data_analisis.number_of_reviews >23)]
In [135]:
consulta_2.head()
Out[135]:
id room_type bathrooms bedrooms bed_type price weekly_price monthly_price security_deposit city ... country first_review last_review review_scores_rating number_of_reviews has_availability availability_30 availability_60 availability_90 availability_365
0 2818 Private room 1.5 1.0 Real Bed 1500.0 1500.0 1500.0 $200.00 Amsterdam ... Netherlands 2009-03-30 2019-06-28 98.0 262 t 5 21 35 107
1 20168 Private room 1.0 1.0 Real Bed NaN NaN NaN NaN Amsterdam ... Netherlands 2010-03-02 2019-07-08 88.0 279 t 0 11 32 140
3 27886 Private room 1.0 1.0 Real Bed 2500.0 2500.0 2500.0 $0.00 Amsterdam ... Netherlands 2012-01-09 2019-07-01 99.0 195 t 0 6 14 74
4 28871 Private room NaN 1.0 Real Bed 1956.0 1956.0 1956.0 NaN Amsterdam ... Netherlands 2010-08-22 2019-07-02 97.0 277 t 5 7 10 138
5 29051 Private room 1.0 1.0 Real Bed 1435.0 1435.0 1435.0 NaN Amsterdam ... Netherlands 2011-03-16 2019-06-29 95.0 428 t 2 7 9 173

5 rows × 21 columns

A continuación, se ejecuta pandas profiling sobre el dataframe denominado consulta_2

In [136]:
pandas_profiling.ProfileReport(consulta_2)
Out[136]:

Validaciones con duplicados

Las validaciones con duplicados se pueden realizar con el siguiente código:

In [137]:
consulta_2.duplicated().sum()
Out[137]:
0
In [140]:
##Para eliminar duplicados (en caso que existan)
consulta_2=consulta_2.drop_duplicates()

Preguntas.

  1. Realizar un "enriquecimiento" de información, utilizando los datos que se encuentra en la página http://insideairbnb.com/get-the-data.html, para tal fin, utilice la funciòn merge (http://www.datasciencemadesimple.com/join-merge-data-frames-pandas-python/).
  2. Seleccione un conjunto de variables que pueda ser de interés.
  3. Teniendo en cuenta los resultados del ítem 2, realizar un análisis y el tratamiendo adecuado en función del tipo de variables, datos faltantes y datos duplicados.
  4. Formule 4 hipótesis que se puedan abarcar sobre el conjunto de datos resultantes en el punto anterior, y de ser prosible dar respuesta a estas utilizando pandas profiling.
In [ ]: